Computers in Biology and Medicine

Elsevier BV

Preprints posted in the last 90 days, ranked by how well they match Computers in Biology and Medicine's content profile, based on 120 papers previously published here. The average preprint has a 0.15% match score for this journal, so anything above that is already an above-average fit.

1
Computational Fluid Particle Dynamics-Informed Machine Learning Prototype for a User-Centered Smart Inhaler Enabling Uniform Drug Delivery to Small Airways

Zhang, Z.; Yi, H.; Kolanjiyil, A. V.; Liu, C.; Feng, Y.

2026-03-19 bioengineering 10.64898/2026.03.16.712264 medRxiv
Top 0.1%
22.9%

Small airways are the primary sites of airflow obstruction in chronic obstructive pulmonary disease. Effective delivery of aerosolized drug particles to these regions is crucial to maximize treatment efficacy while minimizing side effects. However, conventional inhalation therapy approaches (i.e., full-mouth particle release and inhalation (FMD)) typically result in insufficient drug deposition in the small airways and an uneven distribution across the five lung lobes. To address such deficiencies, the goals of this study are threefold: (1) to develop a fast and accurate framework to secure target drug delivery (TDD) nozzle diameter and location based on the conventional computational fluid particle dynamics (CFPD)-FMD simulations, (2) to develop a CFPD-informed machine learning (ML) inverse-design framework that predicts optimal inhaler nozzle parameters based on patient-specific breathing patterns and drug properties, and (3) to demonstrate the feasibility of embedding this framework into a user-centered smart inhaler prototype to improve uniform TDD to the small airways across all five lung lobes. Specifically, a subject-specific mouth-to-generation-10 human respiratory system was employed, and 108 high-fidelity CFPD-FMD simulations were performed under varied physiological and design parameters, including tidal volume, particle diameter, release location, and release timing. Particle release maps generated from those CFPD-FMD simulations via backtracking identified optimal nozzle diameters and locations that promote uniform multi-lobe drug delivery while limiting off-target deposition. Accordingly, a dataset was compiled with inputs (i.e., flow rate, particle size, release z-coordinate, release time) and targets (i.e., nozzle center x- and y-coordinates, nozzle diameter). These inputs and targets form the CFPD-TDD dataset, on which 16 ML models were trained to learn the inverse mapping from patient- and drug-specific inputs to optimal nozzle design parameters.
Performance was evaluated using mean squared error (MSE) and mean absolute error (MAE) overall and per target feature. Parametric analysis using CFPD-FMD simulations was conducted to determine how patient-specific and drug-specific factors affect pulmonary air-particle transport dynamics and to explain why achieving CFPD-TDD in small airways with CFPD-FMD strategies remains challenging. Furthermore, the ML evaluation in this feasibility study demonstrated robust learning of the inverse mapping from patient-specific inputs to optimal nozzle parameters. Four top-performing models showed consistently low MSE/MAE across cases, and an ensemble (i.e., mixed model (MixModel)) combining their strengths was formulated. Independent CFPD-TDD simulations beyond the training and testing datasets were used as the ground truth to validate ML-predicted nozzle configurations. Compared with conventional CFPD-FMD strategies, ML-guided nozzle designs significantly improved inter-lobar deposition uniformity and reduced off-target deposition in the upper airways, demonstrating the feasibility of ML-enabled TDD to the small airways. Overall, this study establishes a CFPD-informed ML inverse-design framework as a viable algorithmic foundation for user-centered smart inhalers, enabling adaptive, patient-specific TDD to the small airways with improved deposition uniformity across all five lung lobes. By integrating first-principle-based CFPD with ML, this work provides a methodological pathway toward next-generation smart inhalers for more effective treatment of small airway diseases.
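The evaluation described above (MSE and MAE, overall and per target feature) can be sketched as follows. This is only an illustration under stated assumptions: the function name `per_target_errors`, the target names, and the numbers are hypothetical and do not come from the preprint.

```python
import numpy as np

def per_target_errors(y_true, y_pred, names):
    """Return MSE and MAE overall and for each target column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_pred - y_true
    report = {
        "overall": {
            "mse": float(np.mean(diff ** 2)),
            "mae": float(np.mean(np.abs(diff))),
        }
    }
    for j, name in enumerate(names):
        report[name] = {
            "mse": float(np.mean(diff[:, j] ** 2)),
            "mae": float(np.mean(np.abs(diff[:, j]))),
        }
    return report

# Made-up values for illustration only (two samples, three targets).
y_true = np.array([[0.0, 0.0, 2.0], [1.0, -1.0, 4.0]])
y_pred = np.array([[0.5, 0.0, 2.0], [1.0, -2.0, 3.0]])
names = ["nozzle_x", "nozzle_y", "nozzle_diameter"]  # hypothetical labels
report = per_target_errors(y_true, y_pred, names)
```

Reporting the per-target values alongside the overall ones shows whether a low aggregate error hides a poorly predicted target, for example the nozzle diameter.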

2
Agent-Based Modeling of Idiopathic Lung Fibrosis and Mechanistic Treatments

Gunputh, N. D.; Kilikian, E.; Miranda, C. A.; Peirce, S. M.; Ford Versypt, A. N.

2026-03-25 systems biology 10.64898/2026.03.22.713503 medRxiv
Top 0.1%
18.7%

Agent-based modeling (ABM) is a computational method for predicting the emergent outcomes of interacting, autonomous individuals in a complex system. Here, ABM is used to simulate interactions between fibroblast and myofibroblast cells during idiopathic pulmonary fibrosis (IPF) in alveolar tissue microenvironments. These microenvironments are derived from histology of a healthy human lung sample and moderate- and severe-IPF lung samples. Fibroblast differentiation, cell migration, and collagen secretion in response to the spatial distribution of the cytokine transforming growth factor-beta are captured in the ABM using NetLogo software. Results are presented from one simulated year without treatment and with mechanisms representing treatment by pirfenidone and pentoxifylline, alone and in combination. A total of 180 in silico experiments are run, analyzed, and compared in a high-throughput workflow. The effects of the initial number of fibroblasts and treatment scenarios on various metrics related to collagen accumulation and collagen invasion into alveolar regions are determined. The ABM and the analysis files are shared to facilitate model reuse. By integrating computational modeling of IPF and therapeutics, this research aims to improve understanding of fibrosis progression and assess the efficacy of novel and existing treatments targeting different mechanisms to inform decision-making for IPF treatment.

3
Bridging Acoustic and Semantic Spaces for Interpretable Voice Scoring via Zero-Shot Semantic Expansion

Hsiao, C.; Cheng, Y.-R.; Yang, C.-Y.; Hsu, F.-S.

2026-06-01 health informatics 10.64898/2026.05.29.26354442 medRxiv
Top 0.1%
18.6%

Subjective auditory-perceptual evaluation and uninterpretable deep learning models limit the clinical assessment of voice disorders. This study proposes a two-phase zero-shot framework to evaluate voice pathology. First, an Audio Spectrogram Transformer is fine-tuned on the Perceptual Voice Quality Database to generate an acoustic latent space. Second, Orthogonal Procrustes analysis maps these acoustic embeddings directly onto the semantic space of a pre-trained Sentence Transformer. The geometric alignment produced continuous semantic axes that outperformed a supervised machine learning baseline in regressing clinician-rated GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) severity scales. Furthermore, these axes correlate with traditional acoustic measures, including Harmonics-to-Noise Ratio and local jitter, while remaining robust when applied to aperiodic signals by not requiring fundamental frequency extraction. Most importantly, the model achieved zero-shot semantic expansion, successfully evaluating voices using an untrained, natural clinical vocabulary beyond the GRBAS scale. External validation on the Voice ICarus Database confirmed cross-corpus stability and demonstrated the capacity for zero-shot differential phenotyping of specific etiologies, such as hypokinetic dysphonia and reflux laryngitis. By bridging acoustic and semantic latent spaces, this framework offers an objective, continuous, and transparent metric for evaluating voice quality using voice descriptive vocabulary.
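The alignment step named in this abstract, Orthogonal Procrustes analysis, can be illustrated with SciPy's `scipy.linalg.orthogonal_procrustes`. The sketch below uses synthetic stand-in embeddings and assumes both spaces have the same dimensionality; how the preprint matches dimensions or pairs audio with text is not stated in the abstract, so those details are assumptions.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)

# Synthetic stand-ins: 50 paired samples, 8-dimensional embeddings (assumed sizes).
acoustic = rng.normal(size=(50, 8))

# Build a "semantic" space that is an exact rotation of the acoustic one,
# so the recovered mapping can be checked.
q_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))
semantic = acoustic @ q_true

# R is the orthogonal matrix minimizing ||acoustic @ R - semantic||_F.
R, _scale = orthogonal_procrustes(acoustic, semantic)
aligned = acoustic @ R
```

Because `R` is constrained to be orthogonal, the mapping preserves distances and angles between acoustic embeddings, which is one reason this kind of alignment is often easier to interpret than an unconstrained linear map.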

4
A biatrial digital twin integrating electrophysiology, mechanics, and circulation: from physiology to atrial fibrillation

Pico-Cabiro, S.; Zingaro, A.; Puche-Garcia, V.; Lialios, D.; Vazquez, M.; Echebarria-Dominguez, B.; Izquierdo, M.; Carreras-Costa, F.; Saiz, J.; Casoni, E.

2026-03-16 bioengineering 10.64898/2026.03.12.711092 medRxiv
Top 0.1%
17.2%

Atrial electromechanics plays a key role in cardiac function by regulating ventricular filling and global hemodynamics, yet remains challenging to model consistently across scales. In this work, a multiscale atrial digital twin for simulations of normal and pathological atrial function is presented, formulated as an electromechanical framework for biatrial simulations that couples three-dimensional atrial electrophysiology and mechanics with a closed-loop zero-dimensional circulatory model. The framework is calibrated on a patient-specific biatrial anatomy to reproduce physiological regional activation times, atrial volumes, ejection fractions, and pressure-volume loop characteristics. The simulations capture all atrial functional phases throughout the cardiac cycle, including realistic figure-eight pressure-volume loops, an aspect hard to achieve in computational studies. A systematic sensitivity analysis quantifies the influence of active contraction, passive stiffness, boundary conditions, and circulatory parameters on atrial function. Finally, application to a pathological scenario through induced persistent atrial fibrillation demonstrates how electrophysiological remodelling propagates across scales, leading to loss of effective atrial contraction, altered atrioventricular flow patterns, and a clinically relevant reduction in cardiac output. Overall, this multiphysics and multiscale framework provides a robust platform to investigate how atrial electrical alterations drive mechanical and hemodynamic alterations in both healthy and pathological conditions.

5
Physics-Guided Deep Neural Networks: Correcting Physical Distortions in Protein Phase Separation Prediction

Wang, M.; Lu, T.; Song, Y.-h.; Li, Y.

2026-04-21 cell biology 10.64898/2026.04.18.719364 medRxiv
Top 0.1%
15.3%

Background: In computational biology, embedding known physical laws into deep learning models to construct "Physics-Informed Neural Networks" (PINNs) is a mainstream paradigm for enhancing model interpretability and extrapolation capability. However, in complex multi-physics coupling problems, there is a risk of competitive imbalance between the physical term and the flexible artificial intelligence (AI) residual term, causing the model to degenerate into a "black-box" fit and lose the original purpose of being physics-driven. Methods: In this study, targeting the problem of predicting protein liquid-liquid phase separation (LLPS) behavior in response to environmental factors (temperature, salt concentration), we identified physical distortions, gradient vanishing, and numerical instability in the initial physics-AI hybrid model. Three core correction strategies were proposed: (1) Weight Allocation Logic Reconstruction: Force the physical trunk weight to 1.0 at the output layer, suppressing the AI residual term to the perturbation level of 0.05 to 0.1, ensuring physics dominance; (2) Robust Physics Formula Construction: Abandon the unstable power function and introduce a combination of Softplus and logarithmic functions to stably simulate the nonlinear effects of charge shielding; (3) Gain Compensation Alignment: Apply gain compensation to the weak signal branch (temperature) to ensure its effective participation in optimization. Results: The optimized model maintained a fitting accuracy of R² ≈ 0.62 on the test set, while physical consistency was significantly enhanced. The model successfully restored the monotonic increase in solubility with temperature characteristic of UCST-type phase diagrams and correctly captured the nonlinear charge shielding features in the salt concentration response.
The weights of key physical parameters (e.g., hydrophobic contribution w_h, net charge contribution w_ncpr) increased from below 10⁻³ to the order of 10⁻², demonstrating the reactivation of the physical branch. Conclusions: The weight control, formula stabilization, and signal gain alignment strategies proposed in this study effectively address the classic problem of "AI hijacking" physics in physics-AI hybrid models. This work provides a universal solution for constructing biophysical predictive models that combine high fitting accuracy with strong physical interpretability.

6
Predicting post-TEVAR endoleaks: a pre-operative hemodynamic risk factor from patient-specific Fluid-Structure Interaction simulations

Duca, F.; Tavarone, S.; Domanin, M.; Bissacco, D.; Trimarchi, S.; Vergara, C.; Migliavacca, F.

2026-03-18 bioengineering 10.64898/2026.03.16.712077 medRxiv
Top 0.1%
14.4%

Thoracic Endovascular Aortic Repair (TEVAR) is a minimally invasive procedure for the treatment of thoracic aortic pathologies, such as Thoracic Aortic Aneurysm (TAA). Computational simulations can provide valuable insights into TEVAR outcomes and complications prior to surgery, making them a useful tool in procedural planning. In this work, Fluid-Structure Interaction (FSI) computational simulations are carried out in ten pre-TEVAR patient-specific TAA cases, for which post-TEVAR outcomes are known, to quantify the hemodynamic drag forces acting on the aortic wall. Based on these results, this study proposes a new risk factor R to predict the occurrence of type I and III endoleaks. The patient cohort is divided into a calibration set, used to associate specific R values with three different risk levels, and a validation set, used to test the risk factor's efficacy. Based on the risk factor values obtained for the calibration set, R ≤ 0.33 is associated with low risk of endoleak formation, 0.33 < R ≤ 0.67 with moderate risk, and R > 0.67 with high risk. Once applied to the validation set, the risk factor is able to predict the formation of a type Ia endoleak. The risk factor proposed in this work is capable of identifying all the endoleak cases analysed, as well as conditions known to increase the risk of TEVAR complications. This study represents a preliminary attempt to determine whether pre-TEVAR hemodynamics can effectively predict post-TEVAR complications and thereby aid clinicians in pre-operative planning.
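The three risk bands stated in this abstract can be written as a small lookup. This is a minimal sketch: the function name `endoleak_risk_level` is hypothetical, and the computation of R itself from the FSI drag forces is not described in the abstract, so it is not attempted here.

```python
def endoleak_risk_level(r: float) -> str:
    """Map a risk factor value R to the band given in the abstract.

    R <= 0.33 is low, 0.33 < R <= 0.67 is moderate, R > 0.67 is high.
    """
    if r <= 0.33:
        return "low"
    if r <= 0.67:
        return "moderate"
    return "high"

# Boundary values belong to the lower band, following the inequalities above.
levels = [endoleak_risk_level(r) for r in (0.10, 0.33, 0.50, 0.67, 0.90)]
```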

7
A Consensus-Driven Stacking Ensemble Framework for Interpretable Cardiovascular Risk Prediction and Clinical Deployment

Sozol, S. S.; Dev Nath, B. C.; Fahim, F. M. S.; Suzana, N. N.; Mirza, J. F.; Ahmmed, S.; Zohra, F.-T.; Zafr, A. H. A.; Uddin, M. N.; Mondal, M. R. H.; Hoque, A. S. M. L.

2026-05-26 health informatics 10.64898/2026.05.18.26352989 medRxiv
Top 0.1%
12.7%

Machine learning (ML) is being considered to help diagnose cardiovascular diseases (CVD). Still, challenges such as inconsistent and limited datasets, limited infrastructure, and global inequalities create a need for a reliable and practicable ML solution. This paper presents an ML-driven framework for predicting CVD risk scores and classifying status. Several data preprocessing techniques, including multiple imputation by chained equations (MICE) and outlier removal, are considered. In addition, hyperparameter tuning is performed with the GridSearchCV tuning technique. Moreover, a consensus-driven five-feature selection method is applied to identify optimal predictors. The dataset used in this study contains healthcare records related to future CVD risk scores, comprising 1,529 patient records with 22 features. The optimized stacked ensemble model is applied to the dataset and achieves a cross-validated coefficient of determination value of 98.13% for CVD risk score regression. Comparative evaluation with other ML models confirmed improved accuracy, efficiency, and interpretability. The explainable AI technique SHAP is applied to interpret predictions and highlight key risk factors. Moreover, a deployment-ready web platform with multi-role access has been developed that demonstrates clinical applicability. The proposed framework offers a reliable and interpretable tool for early detection of CVD and personalized risk assessment. In the future, this work can be extended to integrate longitudinal data, medical imaging, and deep learning to improve generalizability and strengthen real-world impact.

8
Automatic deep learning-based segmentation and quantification of stented arterial cross-sections for morphometric analysis

Kraftberger, M.; Spirgath, K.; Haase, T.; Bandelin, R.; Meyer, T.; Jaitner, N.; Tzschätzsch, H.

2026-04-30 pathology 10.64898/2026.04.28.721259 medRxiv
Top 0.1%
12.6%

Arterial vascular diseases, such as atherosclerosis, are among the most serious global health threats. In preclinical studies, morphometric analysis of histological arterial cross-sections is considered the gold standard for assessing vascular remodeling and the effectiveness of therapeutic interventions. However, morphometric analysis is usually performed manually, which is time-consuming, subjective, and requires significant user interaction. This paper presents a fully automated, operator-independent framework for the precise morphometric analysis of stented arterial cross-sections, extending the previously developed qHisto (quantitative histology) framework for the quantification of various histological components. A neural network for the segmentation of arterial structures was trained and evaluated using 819 cross-sections. In addition, a quantitative analysis of vascular morphology, fibrin area, and lumen asymmetry was performed using 72 cross-sections from coated and uncoated balloons. The model achieved high segmentation accuracy with a median Dice similarity coefficient of 0.892-0.996. Compared to manual evaluation, the system reduces analysis time by 90%, enabling efficient processing of large datasets. Furthermore, morphometric analysis with qHisto showed significant differences between coated and uncoated balloons, e.g. regarding lumen area (AUC = 0.86) and fibrin ratio (AUC = 0.94). Our developed framework enables fully automated, comprehensive and standardized analysis of histological arterial cross-sections. This helps to reduce time-consuming, repetitive manual assessments and thus facilitates research of disease mechanisms and treatment effects in preclinical studies.
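For readers unfamiliar with the segmentation metric reported here, the Dice similarity coefficient between two binary masks is 2|A ∩ B| / (|A| + |B|). The sketch below uses made-up masks; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions, not details from the preprint.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / total

# Made-up 2x3 masks: 3 foreground pixels each, 2 of them shared.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```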

9
Mechanistic Insights into Skin Sympathetic Nerve Activity Dynamics in Healthy Subjects Through a Two-Layer Signal-Analytical and Closed-Loop Physiological Modeling Framework

Lin, R.; Halfwerk, F. R.; Donker, D. W.; Tertoolen, J.; van der Pas, V. R.; Laverman, G. D.; Wang, Y.

2026-04-13 health informatics 10.64898/2026.04.11.26350680 medRxiv
Top 0.1%
12.3%

Objective: Skin sympathetic nerve activity (SKNA) has emerged as a promising non-invasive surrogate measure of sympathetic drive, but its relevant physiological characteristics remain ill-defined. This observational study aims to investigate its regulatory patterns during rest and Valsalva maneuver (VM) in healthy participants. Method: Using a two-layer strategy integrating signal analysis and physiological modelling, we analyzed data recorded from 41 subjects performing repeated VMs. The observational layer includes time-domain feature comparisons using linear mixed-effect models, and time-varying spectral coherence analysis. The mechanistic layer proposes a mathematical model to investigate whether baroreflex and respiratory modulation are sufficient to reproduce the observed HR and average SKNA (aSKNA) dynamics. Main Results: Mean integrated SKNA (iSKNA) showed more significant change than HRV for VM-induced effects. We also found that the mean iSKNA increase during VM varies with BMI and sex. The coherence analysis indicated that iSKNA strongly synchronized with EDR under resting conditions. The proposed model successfully reproduced the main characteristics of aSKNA dynamics, yielding a high median Pearson correlation coefficient (PCC) of 0.80 ([Q1, Q3] = [0.60, 0.91]). In contrast, HR dynamics were only partially captured, with a median PCC of 0.37 ([Q1, Q3] = [0.16, 0.55]). These results suggest that SKNA provides a more direct representation of sympathetic burst dynamics during VM in healthy subjects. Significance: This study provides convergent evidence that SKNA reflects known autonomic regulatory influences in healthy subjects. These findings strengthen the physiological interpretability of SKNA while clarifying its appropriate use as a practical biomarker of sympathetic function.

10
MOE-ECG: Multi-Objective Ensemble Fusion for Robust Atrial Fibrillation Detection Using Electrocardiograms

Peimankar, A.; Hossein Motlagh, N.; K. Khare, S.; Spicher, N.; Dominguez, H.; Abolghasemi, V.; Fujiwara, K.; Teichmann, D.; Rahmani, R.; Puthusserypady, S.

2026-03-30 health informatics 10.64898/2026.03.28.26349522 medRxiv
Top 0.1%
10.1%

Background: Atrial fibrillation (AFib) is the most common sustained arrhythmia in the world, imposing a heavy clinical and economic burden on global healthcare systems. Early detection of AFib can reduce mortality and morbidity, while helping to alleviate the growing economic burden of cardiovascular diseases. With the increasing availability of digital health technologies, computational solutions have great potential to support the timely diagnosis of cardiac abnormalities. Objectives: With the increasing availability of electrocardiogram (ECG) data from clinical and wearable devices, manual interpretation has become impractical due to its time-consuming and subjective nature. Existing automated approaches often rely on single classifiers or fixed ensembles that primarily optimize predictive accuracy while neglecting model diversity, which leads to limited robustness and generalization across heterogeneous datasets. Therefore, this study aims to develop a robust and diversity-aware framework for automatic AFib detection that simultaneously improves classification performance and model generalizability. To this end, we propose MOE-ECG, a multi-objective ensemble selection and fusion framework that explicitly optimizes both predictive performance and inter-model diversity for reliable AFib detection from ECG recordings. Methods: The proposed multi-objective ensemble (MOE) framework uses ensemble selection as a bi-objective optimization problem and employs multi-objective particle swarm optimization to identify complementary classifiers from a heterogeneous model pool. Unlike conventional ensembles, it explicitly optimizes both predictive performance and diversity and integrates Dempster-Shafer theory for uncertainty-aware decision fusion. After filtering the ECG signals to remove baseline wander and noise, they were segmented into windows of 20, 60, and 120 heartbeats with 50% overlap. 
The proposed approach was evaluated over five independent runs to assess its stability and generalization. Fifteen statistical and nonlinear features were obtained from the RR-intervals of the pre-processed ECG signals, of which eight features were selected with correlation analysis to capture subtle information from the ECG data. We trained and evaluated the performance of the proposed model in three open source databases, namely, the MIT-BIH Atrial Fibrillation Database, Saitama Heart Database Atrial Fibrillation, and Long-Term AF Database. Results: The proposed approach achieved the best overall performance on 60-beat segments, with an average accuracy of 89.85%, precision of 91.14%, recall of 94.19%, an F1-score of 92.64%, and area under the curve (AUC) of around 0.95. Statistical analysis using Holm-adjusted Wilcoxon tests confirmed significant improvements (p<0.05) compared to both the best individual classifier and the unoptimized average ensemble of all classifiers. These findings show that the proposed selection and evaluation methodology, rather than group aggregation alone, is the key driver of performance improvements. Conclusion: The results obtained demonstrate that the MOE-ECG model offers a robust, accurate, and reliable solution for the detection of AFib from short ECG segments. The empirical findings, in general, confirm that multi-objective ensemble fusion enhances diagnostic performance and offers robust predictions that will open up possibilities for real-time AFib detection in clinical and tele-health settings.
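Two details of this abstract lend themselves to a short sketch: the segmentation into fixed-length beat windows with 50% overlap, and the relation between the reported precision, recall, and F1-score. The function names below are hypothetical, and using a plain list of RR intervals as a stand-in for heartbeats is an assumption. The reported figures are averages over runs, so the F1 of the averaged precision and recall need not equal the averaged F1 in general.

```python
def overlapping_windows(rr_intervals, beats_per_window, overlap=0.5):
    """Split a sequence into fixed-length windows with fractional overlap."""
    step = max(1, int(beats_per_window * (1 - overlap)))
    last_start = len(rr_intervals) - beats_per_window
    return [
        rr_intervals[start:start + beats_per_window]
        for start in range(0, last_start + 1, step)
    ]

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# 100 made-up RR intervals, 20-beat windows, 50% overlap: window starts 0, 10, ..., 80.
rr = list(range(100))
windows = overlapping_windows(rr, 20)

# Harmonic mean of the precision and recall reported for 60-beat segments.
f1_from_reported = f1_score(91.14, 94.19)
```

Rounded to two decimals, the harmonic mean of the reported precision and recall agrees with the F1-score of 92.64% given in the abstract.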

11
Enhanced precision of tensor electrocardiography through increased cumulative distribution function resolution: Validation in healthy individuals

TSUKADA, Y. T.; Hirayama, H.; Yodogawa, K.; Murata, H.; Iwasaki, Y.-k.; Fujino, T.; Shiozawa, A.; Tsukada, S.

2026-06-02 cardiovascular medicine 10.64898/2026.05.31.26354561 medRxiv
Top 0.1%
10.1%

Deep-learning ECG analysis is advancing rapidly but lacks stable, physiologically interpretable indicators to anchor explainable artificial intelligence (AI). Tensor cardiography (TCG) models electrocardiographic (ECG) waveforms as differences between pairs of cumulative distribution functions (CDFs), representing collective myocardial action potential transitions. However, the original 4-CDF model has limitations in fitting P waves and complex QRST patterns. This study aimed to evaluate whether increasing the number of CDFs from 4 to 10 improves TCG fitting accuracy and to characterize normative distributions of 10-CDF parameters in healthy individuals. Participants were recruited through occupational health screening at Tobu Railway Co., Ltd. (n = 415) and from the Nippon Medical School Hospital ECG database (n = 29). Standard 12-lead ECGs from 444 healthy participants, including 345 men and 99 women with a mean age of 46.9 years, were analyzed using TCG software. Reconstruction accuracy was assessed using RMSE, paired t-tests, and Cohen's d. The 10-CDF model achieved significantly lower RMSE values across all leads than the 4-CDF model, with all p values < 0.0001 and very large effect sizes. In representative leads, RMSEs for the 4-CDF versus 10-CDF models were 0.0256 versus 0.0061 in lead II, 0.0230 versus 0.0063 in lead V1, and 0.0265 versus 0.0062 in lead V5. The coefficient of determination improved from a median of 0.952 with the 4-CDF model to 0.997 with the 10-CDF model in lead II. Parameter dispersion was reduced, suggesting improved estimation stability. Two new parameters, T_mean_diff and RT_mean_duration, were derivable from the expanded model; RT_mean_duration showed significant correlations with age and body surface area. In conclusion, increasing the CDF resolution from 4 to 10 significantly enhanced ECG waveform reconstruction accuracy and parameter stability.
These findings provide normative distributions of 10-CDF TCG parameters and may support future explainable AI-based ECG analysis.

12
Benchmarking Clinical Reasoning in Large Language Models: A Comparative Assessment Study

Prade, T.; Samwald, M.

2026-03-15 health informatics 10.64898/2026.03.13.26347597 medRxiv
Top 0.1%
10.0%

Evaluation of Large Language Models (LLMs) and their clinical competence has mainly focused on conventional multiple-choice (MCQ) formatted medical question answering exams, yielding benchmarks like MedQA-USMLE, where models have already exceeded expert-level performance. However, alternative assessment methods have recently been proposed, such as SCT-Bench based on Script Concordance Testing (SCT), which evaluates clinical reasoning and probabilistic thinking under uncertainty. Reasoning-optimized models have unexpectedly scored worse on SCT-Bench despite outperforming non-reasoning models on other medical benchmarks. This study compared performance metrics, uncertainty proxies and clinical reasoning qualities between MedQA-USMLE and the public subset of SCT-Bench using instruction-tuned GPT-4.1, contrasting baseline and Chain-of-Thought (CoT) prompting across sampled responses. CoT prompts were designed to explicitly instruct the model to apply cognitive clinical reasoning strategies, with their usage subsequently evaluated across both benchmark formats. CoT prompting improved MedQA performance from 86.4% to 93.0%, while SCT-Bench score showed a non-significant decline from 77.7% to 74.7%. GPT-4.1 systematically overestimated the impact of new information under CoT, leading to overconfidence and increased extreme ratings on SCT questions. Sample-based majority voting significantly improved MedQA scores under CoT but had no meaningful effect on SCT-Bench. Response entropy analysis showed that CoT increased overall answer variability, while simultaneously clustering correct responses on MedQA, an effect absent on SCT-Bench. Calibration and ROC were substantially poorer on SCT-Bench than on MedQA, though CoT improved both on either benchmark. 
Qualitative analysis confirmed GPT-4.1 could apply situation-appropriate reasoning strategies and showed signs of metacognitive awareness about its own reasoning process, with expert rating patterns suggesting possible alignment with expert-like logic. These findings further corroborate limitations in elicited clinical reasoning for SCT-based benchmarking and suggest that reasoning-aware evaluation frameworks could contribute meaningfully to the medical AI benchmark landscape.
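The sample-based majority voting and response entropy analysis mentioned in this abstract can be sketched as below. The abstract does not give the exact definitions used, so Shannon entropy over the empirical answer distribution is an assumption, and the function names and sample answers are hypothetical.

```python
from collections import Counter
import math

def majority_vote(samples):
    """Return the most frequent answer among sampled responses.

    Ties are broken by first occurrence, which is how Counter.most_common orders them.
    """
    return Counter(samples).most_common(1)[0][0]

def response_entropy(samples):
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Made-up sampled answers to a single multiple-choice question.
samples = ["B", "B", "A", "B", "C"]
voted = majority_vote(samples)
entropy_bits = response_entropy(samples)
```

An entropy of zero means every sample gave the same answer, and higher values mean the sampled answers were more spread out across options.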

13
Data Matters: The Impact of Data Curation in the Classification of Histopathological Datasets

Brito-Pacheco, D. A.; Giannopoulos, P.; Reyes-Aldasoro, C. C.

2026-04-17 pathology 10.64898/2026.04.16.26351016 medRxiv
Top 0.1%
9.9%

In this work, the impact of outliers on the performance of machine learning and deep learning models is investigated, specifically for the case of histopathological images of colorectal cancer stained with Haematoxylin and Eosin. The impact is evaluated through the systematic comparison of one machine learning model (Random Forests) and one deep learning model (ResNet-18). Both models were trained with the popular NCT-CRC-HE-VAL-100K dataset and tested on the CRC-HE-VAL-7K companion set. Then, a curation process was performed by analysing the divergence of patches based on chromatic, textural and topological features of the training set and removing outliers to repeat the training with a cleaned dataset. The results showed that machine learning models can benefit more from improvements in the quality of data than deep learning models. Further, the results suggest that deep learning models are more robust to outliers as, through the training process, the architectures can learn features other than those previously mentioned.

14
E-InfertilityTest: An Explainable AI Framework for Male Infertility Assessment

Das, G.; Ghosh, B.; Ghosh, Z.

2026-05-25 bioinformatics 10.64898/2026.05.21.726746 medRxiv
Top 0.1%
8.6%

Male infertility has emerged as a significant concern in modern society, with genetic defects as one of the major underlying causes. This impairment negatively impacts sperm motility and morphology, leading to conditions such as Asthenozoospermia (reduced sperm motility), Teratozoospermia (abnormal sperm morphology) and sometimes Asthenoteratozoospermia (both motility and morphology defects). Assisted reproductive technologies (ART), such as in-vitro fertilization (IVF), offer a potential solution for such cases but with a low success rate. Classical semen analysis provides only a phenotypic snapshot without revealing the fertilizing potential of the sperm. Hence, in order to screen the functional sperm population as well as to gain deeper insight into the reasons underlying the aberrant sperm population, it is important to study their genetic profile. In this work, we have performed a meta-analysis comparing transcriptomic data of infertile sperm from Asthenozoospermia and Teratozoospermia patients with that of fertile sperm from normal individuals. We then screened a signature gene set, which was used to develop a prediction model named Explainable Infertility Test (E-InfertilityTest) to classify fertile versus infertile sperm at the preliminary level. For each prediction, it also provides the set of genes that play a dominant role in that prediction. Thus, it provides the patient-specific dominant gene expression profile responsible for the aberration. This work warrants future validation experiments to substantiate the model's performance in a clinical setting. Users can access the tool named E-InfertilityTest as a standalone version on GitHub. GitHub link: https://github.com/zglabDIB/einfertility.git

15
Multi-Stain Fusion of Histopathology Images Using Deep Learning for Pediatric Brain Tumor Classification

Spyretos, C.; Tampu, I. E.; Lindblad, J.; Haj-Hosseini, N.

2026-04-14 pathology 10.64898/2026.04.10.717785 medRxiv
Top 0.1%
8.5%

The classification of pediatric brain tumors is investigated using deep learning on hematoxylin and eosin (H&E) and antigen Ki-67 (Ki-67) whole slide images (WSIs) from the Children's Brain Tumor Network (CBTN) dataset. A total of 1,662 unregistered WSIs (1,047 H&E and 615 Ki-67 images) were analyzed, including low-grade glioma/astrocytoma (grades 1, 2) (LGG), high-grade glioma/astrocytoma (grades 3, 4) (HGG), medulloblastoma (MB), ependymoma (EP) and ganglioglioma. The aim of this study was to effectively classify pediatric brain tumors using H&E and Ki-67 WSIs individually, and to investigate whether early, intermediate, and late fusion could improve the predictive performance. From each WSI, 224 × 224 pixel patches were extracted, and the instance (patch)-level features were obtained using the histology foundation model CONCHv1_5. The instances were aggregated using clustering-constrained attention multiple instance learning (CLAM) for patient-level classification. Model interpretability and explainability were assessed through attention heatmaps, cell density and Ki-67 labelling index (LI) maps. In the binary grade classification between LGG and HGG, the intermediate concatenation fusion achieved the best performance with a balanced accuracy of 0.88 ± 0.05 (p < 0.005) compared to the single-stain models (H&E: 0.84 ± 0.05, Ki-67: 0.86 ± 0.05). For the 5-class tumor type classification, the one-hidden-layer late fusion learning model achieved the highest balanced accuracy of 0.83 ± 0.04 (p < 0.005), outperforming the single-stain models (H&E: 0.77 ± 0.05, Ki-67: 0.74 ± 0.05). Overall, most of the fusion approaches outperformed the single-stain models in both classification tasks (p < 0.005). The Ki-67 attention maps demonstrated moderate to strong Spearman correlation (ρ = 0.576–0.823) with the cell density and Ki-67 LI maps, suggesting that these features are associated with the model's predictions, although additional features may contribute. The results show that H&E and Ki-67 images provide complementary information, and most of the multi-stain fusion approaches using deep learning improve pediatric brain tumor diagnosis.

16
Multi-task deep learning integrating pretreatment MRI and whole slide images predicts induction chemotherapy response and survival in locally advanced nasopharyngeal carcinoma

Hou, J.; Yi, X.; Li, C.; Li, J.; Cao, H.; Lu, Q.; Yu, X.

2026-04-11 radiology and imaging 10.64898/2026.04.07.26350350 medRxiv
Top 0.2%
8.4%

Predicting response to induction chemotherapy (IC) and overall survival (OS) is critical for optimizing treatment in patients with locally advanced nasopharyngeal carcinoma (LANPC). This study aimed to develop and validate a multi-task deep learning model integrating pretreatment MRI and whole slide images (WSIs) to predict IC response and OS in LANPC. Pretreatment MRI and WSIs from 404 patients with LANPC were retrospectively collected to construct a multi-task model (MoEMIL) for the simultaneous prediction of early IC response and OS. MoEMIL employed multi-instance learning to process WSIs, PyRadiomics and a convolutional neural network (ResNet50) to extract MRI features, and fused multimodal features through a multi-gate mixture-of-experts architecture. Clustering-constrained attention multiple instance learning and gradient-weighted class activation mapping were applied for visualization and interpretation. MoEMIL effectively stratified patients into good and poor IC response groups, achieving areas under the curve of 0.917, 0.869, and 0.801 in the training, validation, and test sets, respectively, and outperformed the deep learning radiomics model, the pathomics model and TNM staging. The model also stratified patients into high- and low-risk OS groups (P < 0.05). MoEMIL shows promise as a decision-support tool for early IC response prediction and prognostication in LANPC.

Author Summary: We have developed a deep learning model that integrates two types of medical images, magnetic resonance imaging (MRI) and digital pathology slides, to simultaneously predict response to induction chemotherapy and prognosis in patients with locally advanced nasopharyngeal carcinoma. Current treatment decisions primarily rely on traditional tumor staging (TNM), which often fails to comprehensively reflect the complexity of the disease. Our model, named MoEMIL, was trained and tested on data from 404 patients across two hospitals and consistently outperformed both single-model approaches and TNM staging methods. By identifying patients who exhibit poor response to induction chemotherapy or higher prognostic risk, our tool can assist clinicians in achieving personalized treatment, enabling intensified management for high-risk patients and avoiding unnecessary side effects for low-risk patients. Additionally, we visualize the model's reasoning process through heat map generation, which highlights the image regions exerting the greatest influence on prediction outcomes. This work represents a step toward more precise treatment for nasopharyngeal carcinoma; however, larger-scale prospective studies are required before the model can be integrated into routine clinical practice.

17
Vascular Deformation Mapping Calibration with Physics-based Synthetic Data on Multi-axial Aortic Motion

Kim, T.; Baker, T.; Burris, N.; Figueroa, A.

2026-05-22 bioengineering 10.64898/2026.05.20.726669 medRxiv
Top 0.2%
8.3%

Aortic stiffness is both heterogeneous and anisotropic. Current non-invasive methods to estimate aortic stiffness are limited to characterizing the aortic tissue as isotropic due to the lack of techniques for extracting multi-axial strain from 3D dynamic images. Vascular deformation mapping (VDM) is a nonrigid image registration technique which has thus far been applied to map aortic growth using longitudinal imaging. In this study, we propose to use VDM to assess 3D aortic deformation by mapping diastolic and systolic images. During the image registration process, penalty parameters are employed to fine-tune image alignment and penalize non-physiological deformations. These penalty parameters must be calibrated to ensure that VDM successfully reproduces multi-axial aortic motion patterns in health and disease. In this paper, we developed a calibration pipeline for these parameters using synthetic data. A rotation-free shell model was used to generate physics-based synthetic data on aortic motion incorporating patient-specific geometries, root motion, and blood pressure from a cohort of 14 subjects (healthy, Marfan syndrome and thoracic aortic aneurysm). An error metric was defined to quantify the quality of the VDM results. Furthermore, a k-means clustering technique was used to categorize the subjects into three clusters based on ascending aortic motion. Optimal penalty parameters were identified for each of the three clusters. The results indicated that patient clusters with smaller aortic root motion required larger rigidity penalty values. The calibrated parameters successfully reduced errors in 3D displacement and multi-axial stretch compared to un-optimized VDM predictions, enhancing the accuracy of capturing aortic deformation from dynamic images. Among the different aortic regions, the ascending thoracic aorta exhibited the largest error reduction.

18
Interpersonal physiological synchrony: estimation and clinical application to cardiac dynamics of parent-infant dyads

Lavezzo, L.; Grandjean, D.; Delplanque, S.; Barcos-Munoz, F.; Borradori-Tolsa, C.; Scilingo, E. P.; Filippa, M.; Nardelli, M.

2026-03-23 bioengineering 10.64898/2026.03.19.712915 medRxiv
Top 0.2%
8.3%

Synchrony is a key mechanism underlying human interactions. Quantifying the level of physiological synchronization that occurs during dyadic exchanges is essential to fully comprehend social phenomena. We present a new index to characterize the coupling of complex physiological dynamics: the optimized Multichannel Complexity Index (opMCI). We validated this approach using synthetic time series of two coupled Hénon maps, with four different coupling levels in unidirectional and bidirectional configurations. We demonstrated that the opMCI method effectively discriminates between all coupling levels. Then, we applied the opMCI metric to heart rate variability data collected from 37 parent-infant dyads during shared reading and playing activities, in the framework of the Shared Emotional Reading (SHER) project, with the aim of assessing the effects of early intervention in preterm babies. Two groups comprised preterm infants: an intervention group, who participated in a two-month shared reading program, and a control group, who practiced shared play activities. A full-term group provided additional control data. The opMCI values were significantly higher for the intervention dyads than for the other groups during the shared reading task, showing that an early reading intervention program could increase parent-infant synchrony in preterm babies.

19
How can AI be compatible with evidence-based medicine?: with an example of analysis of lung cancer recurrence

Usuzaki, T.; Matsunbo, E.; Inamori, R.

2026-04-25 radiology and imaging 10.64898/2026.04.17.26351114 medRxiv
Top 0.2%
7.3%

Despite the remarkable progress of artificial intelligence represented by large language models, how AI technologies can contribute to the construction of evidence in evidence-based medicine (EBM) remains an overlooked issue. An AI approach that is compatible with EBM is now needed. In the present paper, we propose an example analysis, using the variable Vision Transformer, that may contribute to such an approach.

20
Exploratory Assessment of Pulsed-Wave Doppler Representations of Lung Sounds Using Deep Learning: An In-Vitro Phantom Study

Saad, A. A.; Murthi, S. B.; Boctor, E. M.; Teeter, W. A.; Seam, N.

2026-06-10 respiratory medicine 10.64898/2026.06.09.26353787 medRxiv
Top 0.2%
6.9%

The increasing availability of portable ultrasound systems motivates exploration of novel approaches to respiratory signal assessment. In this in-vitro study, we investigate whether pulsed-wave (PW) Doppler ultrasound can capture structured spectral patterns from replayed lung sound recordings. Digitized respiratory sounds were replayed through a tissue-mimicking ultrasound phantom, generating 1,478 PW Doppler spectral images from recordings associated with healthy subjects and several externally labeled disease categories. Exploratory classification experiments using a ResNet-18 architecture demonstrated that these Doppler representations contain learnable differences under controlled conditions. These findings motivate further investigation into PW Doppler as a potential representation of respiratory acoustics.